HiBench: A Representative and Comprehensive Hadoop Benchmark Suite
Abstract
MapReduce and its popular open-source implementation, Hadoop, are becoming ubiquitous for Big Data storage and processing. It is therefore essential to quantitatively evaluate and characterize Hadoop deployments through extensive benchmarking. In this paper, we present HiBench [1], a representative and comprehensive benchmark suite for Hadoop, which consists of a set of Hadoop programs including both synthetic micro-benchmarks and real-world applications. The suite currently contains eleven workloads, classified into four categories, as shown in Table I.
Similar resources
Benchmarking and Performance studies of MapReduce / Hadoop Framework on Blue Waters Supercomputer
MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that need to process large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used open-source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...
MRBS: A Comprehensive MapReduce Benchmark Suite
MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapRed...
LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops
MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...
Hadoop Based Data Intensive Computation on IaaS Cloud Platforms
Chapter 1: Introduction. 1.1 Cloud Platforms. 1.1.1 Amazo...
Performance Benefits of DataMPI: A Case Study with BigDataBench
Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics. Both are widely deployed at Internet companies. At the same time, high-performance data analysis requirements are leading academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. Recently, we have proposed a key-value pair based communication libra...
Publication date: 2012